6 research outputs found

    Algorithms and Adaptivity Gaps for Stochastic k-TSP

    Given a metric $(V,d)$ and a $\textsf{root} \in V$, the classic $\textsf{k-TSP}$ problem is to find a tour originating at the $\textsf{root}$ of minimum length that visits at least $k$ nodes in $V$. In this work, motivated by applications where the input to an optimization problem is uncertain, we study two stochastic versions of $\textsf{k-TSP}$. In Stoch-Reward $k$-TSP, originally defined by Ene-Nagarajan-Saket [ENS17], each vertex $v$ in the given metric $(V,d)$ contains a stochastic reward $R_v$. The goal is to adaptively find a tour of minimum expected length that collects reward at least $k$; here "adaptively" means our next decision may depend on previous outcomes. Ene et al. give an $O(\log k)$-approximation adaptive algorithm for this problem and leave open whether there is an $O(1)$-approximation algorithm. We resolve their open question and, in fact, give an $O(1)$-approximation \emph{non-adaptive} algorithm for this problem. We also introduce and obtain similar results for the Stoch-Cost $k$-TSP problem. Here each vertex $v$ has a stochastic cost $C_v$, and the goal is to visit and select at least $k$ vertices to minimize the expected \emph{sum} of tour length and cost of the selected vertices. This problem generalizes the Price of Information framework [Singla18] from deterministic probing costs to metric probing costs. Our techniques are based on two crucial ideas: "repetitions" and "critical scaling". Using Freedman's and Jogdeo-Samuels' inequalities, we show that for our problems, if we truncate the random variables at an ideal threshold and repeat, then their expected values form a good surrogate. Unfortunately, this ideal threshold is adaptive, as it depends on how far we are from achieving our target $k$, so we truncate at various different scales and identify a "critical" scale.
    Comment: ITCS 202
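
    A minimal sketch of the "truncate and repeat" surrogate idea, under loose assumptions (all names here are hypothetical, and this illustrates only the surrogate computation, not the paper's tour-finding algorithm): truncating a vertex's random reward at a threshold and averaging repetitions gives a deterministic surrogate value; since the ideal threshold is adaptive, surrogates are computed at geometrically spaced scales so a "critical" one can be singled out.

    import random

    def truncated_mean(sample_reward, threshold, trials=10000):
        """Monte-Carlo estimate of E[min(R_v, threshold)]: the deterministic
        surrogate that stands in for the random reward after truncation."""
        return sum(min(sample_reward(), threshold) for _ in range(trials)) / trials

    def surrogates_at_scales(sample_reward, k):
        """Evaluate the surrogate at geometrically spaced thresholds
        k, k/2, k/4, ..., 1; the analysis identifies a 'critical' scale."""
        surrogates = {}
        scale = float(k)
        while scale >= 1:
            surrogates[scale] = truncated_mean(sample_reward, scale)
            scale /= 2
        return surrogates

    # Toy usage: a vertex whose reward is exponential with mean 2, target k = 16.
    for s, val in sorted(surrogates_at_scales(lambda: random.expovariate(0.5), 16).items()):
        print(f"scale {s:6.2f}: E[min(R, scale)] ~ {val:.3f}")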

    Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation

    Knowledge distillation is one of the primary methods of transferring knowledge from large to small models. However, it requires massive task-specific data, which may not be available in many real-world applications. Data augmentation methods such as representation interpolation, token replacement, or augmentation with models are applied to tackle this problem. However, these data augmentation methods either potentially cause shifts in decision boundaries (representation interpolation), are not expressive enough (token replacement), or introduce too much computational overhead (augmentation with models). To this end, we propose AugPro (Augmentation with Projection), an effective and efficient data augmentation method for distillation. Our method builds on top of representation interpolation augmentation methods to maintain the diversity of expressions and converts the augmented data to tokens to avoid shifting decision boundaries. It uses simple operations that come with little computational overhead. Results on multiple GLUE tasks show that our method can improve distillation performance by a large margin at a low time cost. Code is available at https://github.com/google-research/google-research/tree/master/augpro.
    Comment: 20 pages, 5 figures. Accepted by ICLR 202
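
    A minimal sketch of the interpolate-then-project idea described above, with hypothetical names and a toy embedding table (an illustration of the concept, not the released AugPro code): the token embeddings of two examples are interpolated, and each interpolated vector is projected back to the nearest vocabulary token, so the augmented input is again a discrete token sequence.

    import numpy as np

    def augment_with_projection(tok_ids_a, tok_ids_b, emb_table, lam=0.7):
        """Interpolate the embeddings of two token sequences, then project
        each mixed vector onto the nearest token in the embedding table."""
        mixed = lam * emb_table[tok_ids_a] + (1.0 - lam) * emb_table[tok_ids_b]  # (L, d)
        # Projection step: nearest vocabulary embedding in Euclidean distance.
        d2 = ((mixed[:, None, :] - emb_table[None, :, :]) ** 2).sum(-1)  # (L, V)
        return d2.argmin(axis=1)  # back to discrete token ids

    # Toy usage: vocabulary of 100 tokens, 16-dim embeddings, two length-8 inputs.
    rng = np.random.default_rng(0)
    table = rng.normal(size=(100, 16))
    ids_a, ids_b = rng.integers(0, 100, 8), rng.integers(0, 100, 8)
    print(augment_with_projection(ids_a, ids_b, table))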

    ReSQueing Parallel and Private Stochastic Convex Optimization

    We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For an SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. Given $n$ samples of Lipschitz loss functions, prior works [BFTT19, BFGT20, AFKT21, KLL21] established that if $n \gtrsim d \epsilon_{\text{dp}}^{-2}$, $(\epsilon_{\text{dp}}, \delta)$-differential privacy is attained at no asymptotic cost to the SCO utility. However, these prior works all required a superlinear number of gradient queries. We close this gap for sufficiently large $n \gtrsim d^2 \epsilon_{\text{dp}}^{-3}$, by using ReSQue to design an algorithm with near-linear gradient query complexity in this regime.
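
    A minimal sketch of a reweighted estimator in the spirit of ReSQue, with hypothetical names (a sketch of the density-ratio idea, not the paper's implementation): to estimate the gradient of $f$ convolved with a Gaussian of width $\rho$ at a point $y$, reuse samples drawn from the Gaussian centered at a nearby reference point $x$ and correct each stochastic gradient with the importance weight $\gamma_\rho(z-y)/\gamma_\rho(z-x)$.

    import numpy as np

    def reweighted_smoothed_grad(grad_f, x, y, rho, n_samples=100000, seed=0):
        """Estimate the gradient of (f * N(0, rho^2 I)) at y from samples
        z ~ N(x, rho^2 I), reweighting each stochastic gradient by the
        Gaussian density ratio N(z; y, rho^2 I) / N(z; x, rho^2 I)."""
        rng = np.random.default_rng(seed)
        z = x + rho * rng.standard_normal((n_samples, x.shape[0]))
        log_w = (((z - x) ** 2).sum(1) - ((z - y) ** 2).sum(1)) / (2 * rho ** 2)
        return (np.exp(log_w)[:, None] * grad_f(z)).mean(0)

    # Toy usage: f(z) = ||z||^2 / 2 has grad f(z) = z, so the smoothed
    # gradient at y equals y; the estimate below should be close to 0.05.
    x, y = np.zeros(5), 0.05 * np.ones(5)
    print(reweighted_smoothed_grad(lambda z: z, x, y, rho=0.5))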